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ADAPTIVE MULTISCALE DETECTION OF FILAMENTARY 
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RANDOM POINTS 1 
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University of California, San Diego, Stanford University and 
Georgia Institute of Technology 

We are given a set of n points that might be uniformly distributed 
in the unit square [0, l] 2 . We wish to test whether the set, although 
mostly consisting of uniformly scattered points, also contains a small 
fraction of points sampled from some (a priori unknown) curve with 
C Q -norm bounded by f). An asymptotic detection threshold exists in 
this problem; for a constant T-(a,/3) > 0, if the number of points 
sampled from the curve is smaller than T_ (a, [3)n 1/(1+a) , reliable de- 
tection is not possible for large n. We describe a multiscale significant- 
runs algorithm that can reliably detect concentration of data near a 
smooth curve, without knowing the smoothness information a or (3 
in advance, provided that the number of points on the curve exceeds 
T*(a,/3)n 1// ' 1+a \ This algorithm therefore has an optimal detection 
threshold, up to a factor T*/T_. 

At the heart of our approach is an analysis of the data by count- 
ing membership in multiscale multianisotropic strips. The strips will 
have area 2/n and exhibit a variety of lengths, orientations and 
anisotropies. The strips are partitioned into anisotropy classes; each 
class is organized as a directed graph whose vertices all are strips 
of the same anisotropy and whose edges link such strips to their 
"good continuations." The point-cloud data are reduced to counts 
that measure membership in strips. Each anisotropy graph is reduced 
to a subgraph that consist of strips with significant counts. The algo- 
rithm rejects Ho whenever some such subgraph contains a path that 
connects many consecutive significant counts. 

1. Introduction. 
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We cannot help but see faces and castles in clouds, monsters in ink-blots and 
exotic forms in random dots. Form is so central to human perception that, I am 
told, it is extremely difficult to prove something random or formless. Mae- Wan 
Ho [20], 

Suppose we have n data points Xi G [0, l] 2 which at first glance seem uni- 
formly distributed in the unit square. On cursory visual inspection, it seems 
that a suspiciously large number of the data points fall along a smooth curve. 
However, the curve on which these points lie has only been identified after 
inspection of the data. We know that the human visual system has the abil- 
ity to "hallucinate" curvilinear structure in truly random point clouds. We 
are therefore concerned about the reliability of the perceived pattern and 
wish to follow an objective procedure for testing the existence of filamentary 
structure: this procedure should reliably separate filamentary structure from 
random scatter and be computationally tractable. 

This is a prototype for various practical imaging problems that range from 
surveillance to road and streambed tracking to particle physics [1, 31, 34]. 
In all cases, the observer is looking for evidence of a filamentary structure 
in a background of heavy clutter. 

As a first attempt to formalize matters, consider the problem of testing 

HoiX/'^-Uniforn^O,!) 2 , 

versus 

Hi(a,/3) :X/~ d '(l - £ n )Uniform(0, l) 2 + e n Uniform(graph(/)), 

where / £ H61der(a,/3) is unknown. Here, for 1 < a < 2, H61der(a,/3) is the 
class of functions g : [0, 1] — ► [0, 1] with continuous derivative g' that obeys 
\g'{x) — g'(y)\ < a(3\x — y\ a ~ l ■ In words, we believe that a relatively small 
fraction e n of points lie on a smooth curve in the plane. 

1.1. 11 Connect the dots". In our previous work [5], it was shown that 
when a and (3 are fixed and known, there is a detector based on the principle 
that, under Ho, no H61der(a,/3) curve can pass through a very large number 
of points in a random point cloud. More particularly, we know that there is 
a threshold T + = T + (a,(3) such that: 

• If T < T + , we have that, with probability tending to 1, there exists a 
Holder (a, (3) curve that contains at least T ■ n l ^ l+a ^ points (out of n). 

• If T > T + , we have that, with probability tending to 1, there does not exist 
a Holder(a,/3) curve that contains more than T • n 1 ^ l+a ^ points (Xi)f =1 . 

(More concretely, if we deal with Lipschitz curves with |slope| < 1, we have 
found empirically that for moderate n ~ 1000, there will frequently be some 
Lipschitz curve that contains -^/n data points, but rarely will there be one 
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that contains more than 3y/n points.) So, if we happen to notice a curve 
passing through substantially more than T + ■ n l ^ l+a ^ points, we have a 
strong basis to reject the null hypothesis of pure randomness. Moreover, to 
within a constant factor this threshold is optimal; no sequence of tests can 
be reliable for detecting substantially fewer than T_ • n l ^ l+a ^ points, for a 
certain T_ > 0. 

Elaborating this connect-the-dots (CTD) principle leads to a formal hy- 
pothesis test based on searching for curves that contain large numbers of 
points. Let N n (A) = #{{X{\ n A} denote the measure that counts how 
many points lie in the set A. Searching for a curve that contains the maxi- 
mal number of points leads to the optimization problem 

N*(a,(3) = max{iV n (graph(/)) : / G H61der(a, /?)}, 

which searches over all H61der(a,/3) graphs and rejects Ho for values of N* 
that substantially exceed T + • n 1 ^ 1+a \ 

The CTD approach, while very instructive, does not address the concerns 
of someone who might actually be interested in performing a test on real 
data. Such concerns include: 

• Computational burden. The task of finding the largest number of points 
on a H61der(a, f3) curve seems to us to be computationally impractical 
unless a G {1, 2}. 

• Unknown a, j3. The CTD approach assumes a specific a-/3 combination. 
Instead, we desire an algorithm that works regardless of the specific values 
of aG (1,2] and (3 > 0. 

• Fragments. The CTD approach searches only for graphs that extend all 
the way across the square from x = to x = 1. Instead, one wants an 
algorithm that works even for short graphs. 

• General planar curve. The CTD approach assumes that the underlying 
curve can be parametrized as a graph. It seems important to search for 
general curves rather than just graphs — for example, curves that loop 
around in the plane. 

1.2. An adaptive multiscale approach. In this paper we describe an ap- 
proach that addresses the concerns just listed. Our proposal: 

1. Works across for a wide range of (a,j3), and only requires knowing a 
bound on the maximum slope of the curve. 

2. Detects the presence of Hi provided 

for a constant T* which depends on a, (3 and other factors. In view of 
earlier results, this is optimally sensitive to within a factor T*/T_. 
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3. Runs in 0(n 2 • log(n)) flops. 

4. Extends naturally to detect general planar curves that are not graphs. 

5. Extends naturally to detect target filaments of unknown extent that, in 
large samples, can be very short compared to the image extent. 

The detector is based on a kind of multiscale geometric analysis of the 
data set, using a multiscale dictionary of parallelogram strips that exhibit 
a variety of lengths, locations, orientations and aspect ratios. The idea is to 
count membership of data points in various strips, to identify strips with 
significantly large counts and to search for long runs of significantly large 
counts in collections of strips that are "good continuations" of each other. 

The detector is adaptive to the unknown smoothness (a,/3) in the sense 
that it achieves near-optimal performance over a wide range: 1 < a < 2, 
(3 > 0. (This notion of adaptivity parallels the notion of adaptive near- 
minimaxity in nonparametric smoothing, in which a single estimator, able to 
perform in a near-minimax way across a whole range of different smoothness 
conditions, is called adaptive to unknown smoothness [17].) Ultimately, such 
adaptivity flows from ideas behind Lemma 2.2 below, which show that our 
class of strips has certain covering properties uniformly over each smooth- 
ness class in the range 1 < a < 2. 

An interesting aspect of our approach is how simply and naturally the 
principle of good continuation appears and leads to a solution. 

Note that here we consider only the presence of the underlying curve — a 
detection problem. Another question — the estimation problem — is to locate 
the position of the curve accurately. The performance of our procedure for 
estimation will not be addressed here. 

1.3. Contents. Section 2 describes our underlying multiscale data struc- 
tures. Section 3 describes our adaptive algorithm in general terms and gives 
a statement of our main result. Section 4 describes the threshold settings 
that underlie our algorithm, while Sections 5 and 6 analyze its behavior un- 
der Ho and Hi, respectively. Section 7 finishes the paper with a discussion 
of related work. 

2. Multiscale anisotropic strips and good continuation. Our data struc- 
tures comprise a multiscale collection of anisotropic, tilted planar regions 
and a sequence of directed graphs that organize them. We use ideas and 
notation common in dyadic multiscale analysis (e.g., dyadic partitioning) 
[10, 14, 15, 16, 17, 24]; in particular, we assume that n is large and find it 
convenient to let J = [log 2 (n)] denote its dyadic logarithm. The variable j 
will index dyadic scales 2~ J and will range through < j < J. 

In our construction we fix in advance S > 1 (e.g., 2 or 4); this controls 
the maximum |slope| we will be able to detect. 
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Let R(j,k,£i,£2) be a parallelepiped with vertical sides that is w = 2~i 
wide by t = 2~( J ~i> +1 thick. Here j runs through our set of scale indices 
{0, . . . , J}. For examples, see Figure 1. The regions in question have a midline 
that bisects them vertically and will be tilted (sheared) at a variety of angles. 
Notice that these regions are highly anisotropic. While the whole collection 
implicitly depends on n, we suppress this in our notation. Moreover, the 
width w and thickness t depend on j and n, but we also suppress this in our 
notation. Note that the degree of anisotropy is the same for all regions that 
share a common value of j; we generally focus only on one anisotropy class 
j at a time. 

The parameters k and £i,i = 1,2, control the horizontal location of the 
regions and the vertical location and slope of the midline. There is an under- 
lying assumption that we are interested only in regions whose major axis has 
a slope bounded in absolute value by S. The mapping between these discrete 
parameters is intended to insure that the regions pack together horizontally 
and that they are fairly closely spaced in both vertical position and slope. 
Let <5i = £/4 and 62 = t/(Aw) (these again depend implicitly on j and n). 
The parallelepiped R(j,k,£\,£2) will be centered at c = ((k + l/2)w, £161) 
and its midline will have slope s = £2^2- Here < k < w~ l , £\ runs through 
the set 0, . . . , — 1 and £2 runs through the set — Sd^ 1 , ■ . ■ , Sd^ 1 ■ 

We gather all such regions at level (scale) j in IZ(j) = {R(j,k,£i,£2) -k, 
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To organize the regions, we define a directed graph Q{j) = (V(j), <?(j)), 
with vertices V(j) and edges £(j). The vertices are simply the regions in 
TZ(j) - V(j) =1Z(j). The edges connect regions to their good continuations, 
namely, regions that are horizontally adjacent, and that have altitudes and 
slopes that are nearly the same — less than Si and S2 apart, respectively. 
Formally, we have the directed edges in £(j), 

{k,£i,£ 2 )^{k + l,£i+£ 2 + u,£ 2 + v), 

where \u\ < 4, \v\ < 4. 

Figures 2 and 3 illustrate good continuation and bad continuation, re- 
spectively. 

This graphical structure, while very simple, has a perhaps surprising prop- 
erty: it allows us efficiently to cover the graphs of general smooth functions 
that exhibit any of a range of smoothnesses. This claim is summarized in 
three lemmas. The first lemma associates to each Holder class a specific 
anisotropy graph. 

Lemma 2.1. For each fixed (a,/3) combination with 1 < a < 2, we have 
that for all sufficiently large n there is j* = j*(a, (3; n) so that w = w(j*) and 
t = t(j*) obey 



(2.1) 



2(3w a < t < 
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Fig. 3. Examples of bad continuations; either the midlines have very different slopes or 
the sides are effectively disjoint. 



The next result shows that regions in the anisotropy class 1Z(j*(a, (5)) are 
well adapted to cover fragments of the graphs of the associated H61der(a, (3) 
class. 

Lemma 2.2. Let j = j* (a, (3) and suppose f is a Hdlder(a, (3) function 
with a domain that contains 1^ = [kw, {k + l)w). Set = (k + l/2)w, let 
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Fig. 4. Graph of f covered by its kth associated region Rk at scale j ; fk is the tangent 
to graph(/) at Xk and gt is the midline of Rk. 

t\ k^i be the closest multiple of 5\ to f{x k ) and let £2^2 be the closest 
multiple of 82 to f'(xk)- We say that the region R(j,k,£i t k,£2,k) is associated 
to f on Ik ■ This strip covers the graph of f over their common domain: 

(2.2) graph(/|J fc )cJ?(j\Mi, fcj M. 

(See Figure 4.) 

The final lemma in the sequence shows that every function in the H61der(a, (3) 
class corresponds to a covering sequence of regions that makes a connected 
path in Q(j). 

Lemma 2.3. Let j =j*(a,j3) and suppose f is a Hdlder(a, (3) function 
on [0, 1] . For each k = 0, . . . , w~ l — 1, consider the region Rk = R(j, k, £i,k^2,k) 
associated to f by the procedure mentioned in Lemma 2.2. The sequence of 
strips Tj(f) = {Rk '■ < k < w" 1 } consists of spatially adjacent regions, mak- 
ing a kind of tube. When viewed as vertices ofQ(j), the (Rk) are neighbors in 
Q(j), that is, Rk and Rk+i can be connected using edges in S(j). Therefore, 
Tj(f) corresponds to a path in Q(j). 
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Fig. 5. Graph of f covered by its associated tube Tj(f) at scale j. 



These lemmas together show that, while the graphical structure itself is 
based on very simple rules, it is able to associate paths in the graph with 
custom-fitting tubes that cover the graphs of very different kinds of smooth 
functions. The proofs of these lemmas are given in the Appendix. Figure 5 
illustrates the idea. 

3. The multiscale significant-runs algorithm. We now describe the com- 
plete algorithm for analysis of point-cloud data (Xj) looking for suspected 
curvilinear structure. It depends on a counting threshold N* and a length 
threshold L* , both to be defined later. The algorithm has several steps: 

1. Counting membership in anisotropic strips. For every region R, in every 
anisotropy class, we count the number of data that fall into that region, 

N(R) = #{i:X l £R}. 

2. Identifying significant counts. We define a significance indicator, which is 
nonzero when the counts exceed a threshold, 



s(R) = 1{N{R)>N*}- 
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The significance indicator may be viewed as a label on the regions R, 
producing a sequence of a labeled graphs 

£(j) = (V(j),£(i),a(i)), 

where o~(j) = (s(R)) gives the labels on R € TZ(j)- We call this the jth 
significance graph. 

3. Computing longest paths. In each significance graph, we employ a depth- 
first search algorithm to explore all significant paths 

7T = (i?i,i?2, • ■ • , Rm), 

that is, sequences of vertices that are: 

(a) all significant, s(Rk) = 1; 

(b) all connected, (Rk, Rk+i) G £{])■ 

We record the maximum path length in each significance graph: 
L™^ x = max{length(7r) : it is a significant path in 

ITTclX L^i j ' 

j 

4. Decision. We compare L™ ax with a length threshold: 

If L™ ax < L* n , accept H ; if L™ ax > L*, reject H . 

This defines the test, except for the specification of the thresholds L* 
and N* . Asymptotic formulas for these thresholds will be given in Section 4. 

A worked example of the multiscale significant-runs algorithm is illus- 
trated in Figure 6. For this example the synthetic point cloud is mostly uni- 
formly distributed, with fraction e ~ 1 /20 of points lying on a fixed smooth 
curve. The significance threshold was A^* = 8. For this choice of threshold, 
we have (under H ) P{N(R) > N*} « 0.00024. 

The longest run in this example has length 5, which in this case exceeds 
the run-length threshold L* and leads to rejection of the null hypothesis. For 
this small-sample setting, the threshold L* = 3 was obtained by simulation 
rather than asymptotic theory. (Under the null hypothesis, we conducted 
1200 experiments. In each experiment, a point cloud was generated and 
L™ x was computed. The frequencies of L™ ax = 1,2,3,4 and 5 were 302, 
873, 24, and 1, respectively. Based on these results, L* can be set at 
either 2 or 3, giving a test with empirical level P{L™ ax > 2} = 25/1200 or 
P{L™ ax > 3} = 1/1200, resp.) 

Asymptotic theory gives L* ~ 3.74, which leads to the same decision. 

For comparison, Figure 7 gives a simulated example in the null case e = 0; 
the longest run has length 2 in this case. This simulation was typical; in the 
null example, rarely does the longest run exceed 2. Several properties of the 
algorithm are immediate: 
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Fig. 6. A uniform random scatter contaminated by e = 1/20 points on a curve, together 
with the identified significant run (consisting of five strips); n — 1024. 
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FlG. 7. A uniform random scatter, together with the identified longest run of significant 
strips. At length 2 it is not a significant run; n = 1024. 



• Complexity of strip counts. The algorithm calculates all the N(R) for all 
the anisotropic strips. This takes 0(n 2 log(ri)) flops, where n is the num- 
ber of points sampled. Indeed, since each data point can belong to order 
0(relog(n)) strips, by simply calculating which R3 X{ and incrementing 
a counter for those R, we get all N(R). 

• Complexity of longest path. The algorithm calculates the longest path in 
each significance graph. This takes work comparable to 0(nlog(n)), based 
on depth- first search [2]. 

• Storage requirement. The algorithm stores all of the N(R) counts and all 
the significance coefficients. This requires 0(nlog(n)) storage. 

To state our main result, we amend our notion of alternative hypoth- 
esis. For a > 1, let H61der(a, (3, S) denote the collection of functions / G 
H61der(a,/3) with \ f'\ 00 < S. Define 



Hi(a,/?, S, t) :Xi'^"(l — e n )Uniform(0, l) 2 + e n Uniform(graph(/)), 

/ in H61der(a, >T-n~ a ^ l+a \ 
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Theorem 3.1. There is a single choice of thresholds A* arid (Zy^) n so 
that for every a 6 (1,2] and /3 > 0, there is T*(a, (3, S) with, for each r > T*, 

P{test rejects Ho|Hi(a, /?, 5, r)} — > 1 as n — > oo; 

oi the same time 

P{test rejects Ho|Ho} — > as n — > oo. 

In words, under Ho the longest significant path is overwhelmingly unlikely 
to be substantially longer than L* for large n, while under each indicated 
Hi the longest significant path is overwhelmingly likely to be substantially 
longer than L* . The threshold T* is within a constant factor T*/T_ of the 
optimal detection threshold; this shows that, up to constants, we can adap- 
tively test for the existence of fragments of C a graphs. 

4. Asymptotics of thresholds. For practical finite-sample application of 
the test just described, it is of course possible to calibrate thresholds by con- 
ducting simulation experiments. For the proof of our main result, we specify 
thresholds in closed form. These thresholds are very conservative and our 
closed- form analysis yields vastly overstated estimates of error probabilities, 
which are nevertheless good enough for our proofs. 

4.1. Specification of N* . Fix a probability po < 1. Define a counting 
threshold A + (e,A) with the property that 

P{Poisson(A) > A+} < e, 

where Poisson(A) denotes a Poisson random variable with parameter A. Let 
Bin(ra,p) denote a binomial random variable with parameters n and p. By 
Poisson approximation to the binomial, we have 

P{Bin(n,A/n) >N + (e,X)} <2e, n>n . 

Set then N* = N + (po /162, 2). Our definition of N* gives us the key property 

(4.1) P{s( J R) = l|H } = P{Bin(n,2/n) >iV*}<p /81, n>n . 

4.2. Specification of 'L* . We use a convenient, but nonstandard, notation 
borrowed from Arratia and Waterman [6]. For <p < 1, log„ denotes, in 
this unconventional notation, the logarithm with base 1/p. At the same time, 
we maintain the original convention that if b > 1, log fe means log to the base 
b. By this convention, log 2 is the traditional logarithm base two and log^ 
is actually the same quantity. 

Define the length threshold 

(4.2) L*=3Io gpo (n), 

where po G (0, 1) is the same as in the specification of A^*. The underlying 
rationale for this choice is the Erdos-Renyi law (see [6]), which says: 
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In a sequence ofm i.i.d. Bernoulli random variables with probability p of heads, 
the length of the longest run of pure heads ~log p (m)(l + op(l)). 

In effect, our definition makes L* very substantially longer than the length 
of the longest run of pure heads in a linear sequence of 0(n 1 /( 1+a ^) coin 
tosses of a po coin. 



4.3. Specification of T*. Associated to the parameters N* and L* will 
be the threshold T* at which Hi becomes detectable. To define that, set p\ 
sufficiently close to 1 so that, for all a € (1,2] and for some n\ > 0, 

(4.3) log pi (n 1 /(i+«))>2-L;, n>m 
1 /18 

(pi = p works). Implicitly, this choice again refers to the Erdos-Renyi 
law and, in some way, guarantees that under Hi there will be very long 
runs. 

Define an intensity threshold A + (e) with the property that 
P{Poisson(A+(e)) < N*} < e. 
Set A* = A + ((l —pi)/2) and set 

T, (a, 13, S) = 2X* ■ . ^1 + 52, 

We will show that for 

(4.4) s n > T* ■ 

the hypothesis Hi(a,0) becomes detectable by our proposal. The central 
point will be the property 



(4.5) n- e n - w(j*{a,,8,n))/yl + S 2 > A*. 

To prove this, we inspect the proof of Lemma 2.1 in the Appendix and 
note that, by definition of T* (a, J3, S) above and using notation from the 
Appendix, 



n-e n - w(j*(a,8,n))/yT+ S 2 



> n V(i+«) . T)f ■ w(j + (a, 3, n))/(2 v / TT52 ) = A*. 

Incidentally, we do not claim that the algorithm fails for r < T* , only that 
we can prove it succeeds for r > T* . 
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5. Behavior under Hi. Let / be the function in Holder (a, /?) that gen- 
erates the curve that carries the fraction e n of data. Prom Lemmas 2.1-2.3, 
we let j = j* and consider the tube Tj(f). For each region R in this tube, 

N(R) ~ Bin(n, (1 - e n )area( J R) + e n7 (/, R)), 

where area(i?) = 2/n and 7 denotes the relative arc length in the graph of 
/, obeying 

length(graph(/|J)) w 
7U ' ' length(graph(/)) " s/TTS^' 
here I denotes the projection of R on the x-axis. 
By Poisson approximation to the binomial, 



AT( j R) ap Pi ox -p isson( M ), 



where, using (4.5), 



H > 1 + ne n w/\Jl + S 2 
> A*. 

Hence, for every R in this tube, we have, for all sufficiently large n > n.3, 

(5.1) P{N(R)>N*}> Pl . 

Label the sequence of strips R in this tube Rq, . . . , R w -i_i- We want to 
know the probable length of the longest run of the form 

N(R k )>N*,...,N(R k+L )>N*. 

Then, if L n is the length of the longest run in this special sequence, it follows 
that the longest run statistic we are computing over the entire graph can 
only be larger: 

j max \ r 
L nj - L «- 

To show that the test rejects Ho, we will show that 

(5.2) P{L n >L* n }^l, n^oo. 
If we define 

we note that each Zi is Bernoulli with probability pi, while 

Pi > Pi- 

We let m = w~ 1 > Const • n 1// ^ 1+Q ^ and p = pi, and we get, by the Erdos- 
Renyi law, 

L n >log pi (n 1 /d+«))(l + 0p (l)). 
However, by hypothesis we have chosen p\ in such a way that 

log pi (n 1 /(i+-))>2L;. 

Hence (5.2) follows. 
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6. Behavior under Ho. We need to show that, with overwhelming prob- 
ability under Ho, there will be no runs in the graph that exceed L*. We 
start by arguing that 

P{& significant path of length L starts at given i?|Ho} <Po ■ 

Indeed, by choice of N*, for each R, 

P{s(R) = 1|H } = P{Bin(n, 2/n) > N*} < p /81. 

Now each region R has 81 neighbors in G(j), and so by Boole's inequality, 

P{s(R') = 1 for at least one neighbor of i?|Ho} 

< £ p{s(R') = m } 

R' £ neighbors (R) 

< # neighbors • (po/81) 
= 81-(po/81)=po. 

Now if we are looking for a significant path of length greater than L, we need 
that starting at some vertex, it has s(R) = 1 and is connected to a vertex 
with s(R') = 1, et cetera. For a given starting point R, the probability of 
this event is bounded by p$ via negative correlation. 

We now note there are at most Mj = w~ 1 ■ S^ 1 ■ S^ 1 • 2S starting points 
for paths in Q{j). By Boole's inequality, 

Pjthere is a significant path of length L occurring in C/(j)|Ho} 

< # (starting points at level j) 

x Pjsignificant path of length L starting at i?|Ho} 

<M r p%. 

Take log 2 : 

log^Mj) + log 2 (p ) • L = log 2 (ur 1 • Sf 1 • S^ 1 ■ 2S) + log 2 (p ) ■ L 

= 2 J - 2j + 3 + log 2 (5) + log 2 (p ) • L 

<2J + C + \og 2 {p )-L. 

For L = L* , the last expression on the right-hand side tends to — oo as n 
increases. Hence 

Pjthere is a run of length L* |Ho} — > 0, n — > oo. 

7. Discussion. We have given only a sampling of results in a specific 
problem of geometric detection; much more could be done. At the same 
time, our results are closely related to many ideas in the literature of com- 
puter vision. We briefly indicate possible variations and sketch a few such 
connections. 
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7.1. Variations on the filament model. We have only discussed a subset 
of what could pass for filamentary structure in point-cloud data. Other no- 
tions of filamentarity include (a) curves that have less regularity than one 
derivative, (b) curves that have more regularity than two derivatives and (c) 
curves that are not describable as simple graphs (x,y = f(x)). Our aim in 
this paper is to stimulate discussion; we believe that all such generalizations 
will be of interest in appropriate applications areas. We make a few brief 
remarks. 

• Curves that have regularity a < 1. If we consider curves (x,f(x)), where 
/ is H61der-a with a < 1, we are considering curves without tangents. 
We thus discard completely the notion of good continuation based on 
alignment of tangents. In designing graph-based detectors, it is only re- 
quired to use axis-oriented rectangular regions, so that only position (not 
slope) matters, and in which the rectangles are now taller than they are 
wide; Connectivity involves only position (not orientation). The statistical 
treatment based on graphs and runs turns out to be the same; the graphs 
simply have less structure because proximity does not involve similarity 
of slopes. 

• Curves that have regularity a > 2. It makes sense to ask about smoothness 
of higher order, for example, to consider 2 < a < 3. By [5], there will 
continue to be about n 1 /( 1+a ) points on some Holder curve, even for a 
in this range. To sensitively detect such higher-order smoothness would 
require a different set of regions than the one discussed here — curved 
ones with parabolic midlines — and a notion of good continuation based 
on matching of sides, and matching of slopes and curvatures of midlines. 
Preliminary calculations suggest that analogous "adaptivity" results hold 
in such a setting. Related discussion can be found in [3, 5]. 

• Curves that are not graphs. The Introduction suggested that the approach 
described here can be adapted to detect general plane curves, that is, 
curves that are not graphs. This adapts ideas from our work on beam- 
let graphs [4, 16]. We define a family of directed graphs based on regions 
modeled on "dyadically thickened beamlets" with various degrees of thick- 
ening. In this directed graph structure, strips can have all orientations, 
including vertical and horizontal, so that the graph constraint is removed. 
Connectivity between beamlets is based once more on good continuation 
principles — in this case, continuation of plane polygons rather than polyg- 
onal graphs. Otherwise the algorithms are identical. More details can be 
found in [5], where this structure is utilized to prove a theoretical result. 

Still other variations on our model are possible. The idea that data are 
uniformly sampled from a curve of zero width can be varied in several ways: 

• Nonuniform sampling along filaments. A referee suggested that rather 
than from a uniform density, data might arise instead from a density that 
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is bounded away from zero and infinity. The same methodology developed 
here works in that case without change, except that the analysis under 
Hi seemingly becomes more involved. 

• Finite thickness. A referee also suggested that rather than from a curve of 
zero thickness, the data might arise instead from a tube of finite thickness. 
The methodology developed here works without change in such a case, but 
the model Hi is different and the statement of results becomes different. 
Thus, if the thickness of the tube is finite, but smaller than the width of 
the regions that are adapted to the underlying filament, the detectability 
results are the same as here. If the width is greater than that of regions 
adapted to the curve, then the detection threshold becomes higher and it 
takes systematically larger numbers of points near a curve to reject Ho- 

• Finite resolution. A referee also suggested that the data might be of finite 
accuracy, for example, either subject to rounding or to noise. In either case, 
the situation is much the same as in the immediately preceding comment. 
If the inaccuracy is small compared to the width of the optimally fitted 
regions, its impact is negligible. If the inaccuracy is larger than that width, 
then the detection threshold becomes higher and rejection of Ho requires 
systematically larger numbers of points near a curve. 

We leave exploration of such issues for future work. 

7.2. Beyond detection. A referee pointed out the desirability of not just 
detecting the presence of filamentary structures, but also estimating the de- 
tailed location and shape of any filaments that are detected. Of course, con- 
ceptually detection and estimation are quite different tasks; moreover, unless 
detection is possible, estimation is impossible. Empirically, our methodol- 
ogy actually provides an estimator when the filament is not just detected, 
but strongly detected. As the number of points sampled from the filament 
increases well beyond the detection threshold, simulations show that the 
longest run becomes overwhelmingly likely to trace out a series of regions 
that bracket the curve tightly. It seems likely that this effect could be proven 
rigorously. However, it seems rather delicate to formulate and study an ap- 
propriate notion of asymptotically efficient estimation. 

7.3. Extensions beyond the filament model. We can extend beyond the 
setting of two-dimensional point clouds in at least three ways: going to higher 
dimensions, observing vectors rather than points and observing pixel im- 
agery on a grid rather than scattered points. 

7.3.1. Structures in higher dimensions. The analogous detection prob- 
lem in d-dimensional space — finding a curve or surface that contains an 
unexpectedly large number of points — has been considered by the authors 



ADAPTIVE MULTISCALE DETECTION OF FILAMENTARY STRUCTURES 19 



in [3, 5]. That work provides nonadaptive detectors, that is, detectors that 
assume knowledge of the Holder class. 

The ideas developed in this paper can be directly applied to multiscale 
detection of filamentary structure in (i-dimensional point clouds, d > 2. A 
more ambitious generalization — detection of co dimension- k surfaces in d- 
dimensional point clouds — seems possible, but also messier. For example, 
for d = 3 and k = 1 , we are attempting to find a surface in (i-dimensional 
space that contains an inordinately large number of points. The nodes of 
each anisotropy graph are planar slabs of volume 2/n and the neighborhood 
structure in the graph is, while conceptually analogous to the case considered 
here, far more complex to write out. The comparable adaptive detection 
theorems hold in that setting, although we omit details. 

7.3.2. Vector fields. Suppose that instead of data on points Xi, we have 
data on tangent vectors, that is, on pairs that name both a position 

and a direction. Experiments in perceptual psychophysics (e.g., [19, 28]) sug- 
gest that this is a much more potent stimulus to "curve finding" than simply 
the display of random dots. Biological evidence about early vision suggests 
that individual receptive fields fire when both location and orientation offer 
matches. 

As a null hypothesis, we suppose that the Xi's are i.i.d. uniform [0, 1] 2 , 
while the O^s are i.i.d. uniform [— tt/2, vr/2]. As an alternative hypothesis, we 
could posit that a small fraction of the Xi lie on a curve, Xi = (xi, f(xi)), 
and the 0i specify angles parallel to the line with slope f'(xi). 

Our paper [5] showed that such tangent vector data are substantially more 
powerful for identifying filamentary structure than the point data discussed 
so far in this paper. Namely, such data give us the ability to detect filaments 
that contain much smaller fractions of data points. In fact, a reliable test 
for a H61der(2, 1) filament can be based on agreement with more than Tn 1 ^ 
tangent vectors, rather than Tn 1 / 3 points. 

The multiscale data structures used in this paper can be applied in the 
tangent vector setting, where we match (Xi, 0i) to a region based not only on 
Xi € R' , but also on 6i matching the slope of the midline of R. The resultant 
algorithm can take advantage of this more stringent matching structure 
to speed up the counting process, because each tangent vector will lie in 
only one region in a given anisotropy class, resulting in 0(log(n)) flops per 
data point rather than 0(n log(n)). Searching for significant paths will be 
significantly faster as well, since there can only be nlog(n) starting places for 
a path. Hence the whole algorithm can run in 0(nlog(n)) flops. An analysis 
that parallels the one given here shows that a multiscale multianisotropic 
significance runs algorithm can provide a detection threshold that is optimal 
to within a constant factor. Huo and co-workers [23] gave more details from 
the computational aspect of this problem. 
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7.3.3. Pixel imagery. In another direction, we might consider data types 
used to model digital imagery, for example, arrays (2/(21,22) :0 < 21,22 < n), 
where 

Viite =£f{h,h) +0-2(21,22), < 21, 2 2 < n, 

(2(21,22)) is a Gaussian white noise, a is the noise level and £j is a pixel 
array with nonzero values only on pixels that intersect graph(/). Despite 
appearances, this problem is closely related to the present problem and 
analogous detection theorems are true. 

In effect, we form a family of anisotropic multiscale strips and sum the 
pixels that intersect those strips, producing detector statistics X(R) that can 
be significance-tested in a way that parallels the counts N(R) considered in 
this paper, only with Gaussian rather than Poisson threshold analysis. The 
underlying data structures and arguments can be understood as, roughly 
speaking, a mixture of the ideas of this paper and those of another paper 
by these authors [4]. In particular, the strips considered here were called 
axoids in that work. More details can be found in [21, 22]. In the digital 
array setting, there is a fast beamlet transform to rapidly compute all the 
required X(R) detector statistics. 

7.4. On the uniform background assumption. A referee remarked that 
many practical imaging problems do not involve small departures from a 
uniform background. This is no doubt true. However, there is an everex- 
panding array of imaging problems, and some look for changes between one 
image and another, for example, in studying arterial blood flow or in change 
detection in scene surveillance. In the case of no change, the background is 
quite literally uniform. Also, many problems with nonuniform background 
are transformable to uniform background; thus edge detectors and object 
detectors are typically operated at so-called constant false alarm rate. Then 
they give, under the null hypothesis of no object present, roughly a con- 
stant number of events per unit area, and so the constant false alarm rate 
transformation forces a uniform background under Ho- Finally, the intellec- 
tually important issues explored here seem clearest in the uniform case and 
the data structures developed here are known to be useful in more complex 
settings [4]. 

The main point, however, is well taken — detection of objects against 
nonuniform clutter rather than uniform scatter remains a challenging area 
for further work. 

7.5. Relationships to other work. 
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7.5.1. Parametric detection. In this paper we considered detection of 
points on nonparametric curves. In certain cases, one is interested in points 
along lines [8] or on parabolas [1] . For an attractive nonmultiscale approach 
to such detection problems, see [11, 12, 13] and [33], Section 6.3. 

In the authors' paper [4], which was just mentioned, it was shown how to 
develop multiscale geometric detectors for line segments in digital imagery, 
for parametric forms such as circles, rectangles and ellipses. Those ideas 
could be adapted to the present setting to find situations where data have 
an elevated density over a blob or along some line. In the end, the underlying 
computations involve dyadic multiscale rectangles and strips, and the ideas 
are closely related to those in this paper. 

7.5.2. Multiscale geometric analysis. The tools described here are closely 
related to a variety of tools in multiscale methods; see [10, 15, 16, 18, 24, 27, 
32] for discussions of related tools applied in image analysis and in mathe- 
matical analysis. We differ here in our use of a multiscale multianisotropy 
collection of analyzing regions that is organized and exploited in a specific 
way and for a specific purpose. 

7.5.3. Object grouping and neural architecture. In the literature of com- 
puter vision, there is extensive discussion of object grouping in perception 
[7] . The problem considered here — recognizing a curve against a background 
of random points — fits in this tradition. There are even experiments in psy- 
chophysics that test the ability of the human visual system to accomplish 
similar tasks [19, 25]. In effect, what we have discussed here — (near-) optimal 
detection — corresponds to what psychophysicists call "the ideal observer" 
[26]. In this connection, we have exhibited a simple multiscale architecture 
that can provide a near-optimal detector for a very wide range of stimuli — 
curves of any of a wide range of degrees of smoothness. 

In the Gestalt theory of perception, there arises the concept of good con- 
tinuation [19, 35]; experiments show that the visual system will respond 
better to curvilinear stimuli that follow a good continuation of an initial 
pattern [30]. Here we have operationalized this principle by a regular con- 
nectivity pattern in a specific graphical structure. We have shown that with 
this implementation, we get a near-optimal detector, thus validating the 
significance of such a good continuation principle. Note well that the con- 
nectivity pattern is invariant; that is, it applies the same way at all nodes 
of the graph. 

This architectural simplicity is striking when compared with the vast spec- 
ulative literature that proposes "neural architectures" for visual perception. 
What we have shown is that by starting from a large collection of elements 
that are sensitive to (i.e., accumulate counts in) receptive fields at a vari- 
ety of lengths, widths, orientations and locations, and then connecting such 
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elements to other elements by a simple invariant rule, one very sensitively 
recognizes the existence of curvilinear stimuli simply by the existence of long 
connected paths. 

Perhaps this bears comparison with biological evidence. Functional mag- 
netic resonance imaging studies [9] suggest that there are centers in primate 
brains that seem responsible for integrating local information into recog- 
nition of long curvilinear structures [29]. It would be interesting to know 
whether such integration has any resemblance to the simple multiscale con- 
nection mechanism employed here. 

APPENDIX 

Proof of Lemma 2.1. Extend notation so that w(j) = 2~ J can have 
both real and integer arguments, and t(j) = 2~( J ~^ +1 as well. Let j + = 
j + (a,(3,n) satisfy 

2pw(j+r=t(j+), 

that is, 

2(32- aj+ =2~( J ~ i+ ) +1 . 

Let j* = be the next larger integer, so that w{j + )/2 < w(J*) < w(j + ) 
and t(j + ) < t(f) < 2t(j + ). Then because 1 < a < 2, 2 a < 4 and so 

2(3w(j*) a < 2/3w(j + ) a = t(j + ) < t(f) 

< 2 ■ t(j + ) = Af3w(j + ) a < I6f3w(j*) a . 

Now let w = w(j*) and t = t(j*), and substitute in the last display, getting 
(2.1). □ 

Proof of Lemma 2.2. Remember that / G H61der(a,/3) satisfies /: [0, 
1] -> [0, 1] and 

\f'(x)-f'(y)\<a(3\x-y\ a - 1 . 
We saw that this implies 

(A.l) \f(x) - f(y) - f'(y)(x - y)\ < (3\x - y\ a , x,y G [0, 1]. 

To prove (2.2), we use the notation fk(x) for the affine function tangent 
to graph(/) at x^. Figure 4 illustrates our notation. 
Using (A.l), with denoting [kw, (k + l)w), 

\f(x)-f k (x)\<(3(w/2) a <t/4, xel k . 
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We also note that if g k (x) = l\5i + l 2 5 2 (x — x k ), then 

\fk(x) -9k{x)\ < \}{x k ) ~ hh\ + \f{x k ) - h8 2 \\x-x k \ 

< 5 1 /2 + 5 2 /2 x w/2 

< i/8 + i/16. 

We conclude that 

(A.2) \f(x)-g k (x)\<t/2, xel k . 

On the other hand, the region R(j,k,£i,£ 2 ) has g k {x) as its midline and is 
of half-height t/2. The desired relationship (2.2) follows. □ 

Proof of Lemma 2.3. We use the same notation as in the proof of 
Lemma 2.2, as illustrated in Figure 4. 
It is enough to show that 

(A. 3) \g k+1 (x k+1 ) - g k (x k+1 )\ < t 

and 

(A.4) \g k+1 (x k+1 ) - g k (x k+1 )\ < t/w. 

It then follows that there is an edge in £(j) that connects R(j, k,l\,l<i) to 
R(j,k + l,£i,£' 2 ), where £[ and £' 2 are the values associated to g k +i- 

The following inequalities flow either from Holder conditions or from sim- 
ple rounding involved in quantization: 

\g k+ i{x k+ i) - f(x k+1 )\ < Si/2 = t/8, 
\f(x k+ i) - f k (x k+1 )\ < (3w a <t/2, 
\f k {x k+l ) - g k {x k+l )\ < ffi/2 + 5 2 /2 ■ w = t/4. 
Combining these with the triangle inequality yields (A. 3). Similarly, 
\g' k+1 (x k+1 ) - f'(x k+1 )\ < 5 2 /2 = t/(8w), 
\f'(x k+1 ) - f' k (x k+ i)\ < af3w a ~ l < t/(2w), 
\fk(xk+i) -g' k {x k+1 )\ < 5 2 /2 = t/(8w); 
combining these identities gives (A.4). □ 

Acknowledgments. We would like to thank Emmanuel Candes, Hagit 
Hel-Or, Aapo Hyvarinen, Jean-Luc Starck and Brian Wandell for helpful 
discussions and references. 



24 E. ARIAS-CASTRO, D. L. DONOHO AND X. HUO 

REFERENCES 

[1] Abramowicz, H., Horn, D., Naftaly, U. and Sahar-Pikielny, C. (1997). An 
orientation-selective neural network for pattern identification in particle detec- 
tors. In Advances in Neural Information Processing Systems 9 (M. Mozer, M. I. 
Jordan and T. Petsche, eds.) 925-931. MIT Press, Cambridge, MA. 

[2] Alio, A. V., Hopcroft, J. E. and Ullman, J. D. (1983). Data Structures and 
Algorithms. Addison- Wesley, Reading, MA. MR0666695 

[3] Arias-Castro, E. (2004). Graphical structures for geometric detection. Ph.D. dis- 
sertation, Stanford Univ. 

[4] Arias-Castro, E., Donoho, D. L. and Huo, X. (2005). Near-optimal detection of 
geometric objects by fast multiscale methods. IEEE Trans. Inform. Theory 51 
2402-2425. 

[5] Arias-Castro, E., Donoho, D. L., Huo, X. and Tovey, C. (2005). Connect-the- 

dots: How many random points can a regular curve pass through? Adv. in Appl. 

Probab. 37 571-603. MR2156550 
[6] Arratia, R. and Waterman, M. S. (1989). The Erdos-Renyi strong law for pattern 

matching with a given proportion of mismatches. Ann. Probab. 17 1152-1169. 

MR1009450 

[7] Buhmann, J. M., Malik, J. and Perona, P. (1999). Image recognition: Visual 
grouping, recognition, and learning. Proc. Natl. Acad. Sci. USA 96 14,203- 
14,204. 

[8] Copeland, A. C, Ravichandran, G. and Trivedi, M. M. (1995). Localized Radon 

transform-based detection of ship wakes in SAR images. IEEE Trans. Geoscience 

and Remote Sensing 33 35-45. 
[9] Courtney, S. M. and Ungerleider, L. G. (1997). What fMRI has taught us about 

human vision. Current Opinion in Neurobiology 7 554-561. 
[10] David, G. and Semmes, S. (1993). Analysis of and on Uniformly Rectifiable Sets. 

Amer. Math. Soc, Providence, RI. MR1251061 
[11] Desolneux, A., Moisan, L. and Morel, J.-M. (2000). Meaningful alignments. 

Internat. J. Computer Vision 40 7-23. 
[12] Desolneux, A., Moisan, L. and Morel, J.-M. (2003). A grouping principle and 

four applications. IEEE Trans. Pattern Analysis and Machine Intelligence 25 

508-513. 

[13] Desolneux, A., Moisan, L. and Morel, J.-M. (2003). Maximal meaningful events 
and applications to image analysis. Ann. Statist. 31 1822-1851. MR2036391 

[14] Donoho, D. L. (1997). CART and best-ortho-basis: A connection. Ann. Statist. 25 
1870-1911. MR1474073 

[15] Donoho, D. L. (1999). Wedgelets: Nearly minimax estimation of edges. Ann. Statist. 
27 859-897. MR1724034 

[16] Donoho, D. L. and Huo, X. (2002). Beamlets and multiscale image analysis. In 
Multiscale and Multiresolution Methods. Lecture Notes Comput. Sci. Eng. 20 
149-196. Springer, Berlin. MR1928566 

[17] Donoho, D. L. and Johnstone, I. M. (1995). Adapting to unknown smoothness 
via wavelet shrinkage. J. Amer. Statist. Assoc. 90 1200-1224. MR1379464 

[18] Donoho, D. L. and Levi, O. (2004). Fast X-ray and beamlet transforms for three- 
dimensional data. In Modern Signal Processing (D. N. Rockmore and D. M. 
Healy, Jr., eds.) 79-116. Cambridge Univ. Press. MR2075950 

[19] Field, D., Hayes, A. and Hess, R. (1993). Contour integration by the human visual 
system: Evidence for a local "association field." Vision Research 33 173-193. 



ADAPTIVE MULTISCALE DETECTION OF FILAMENTARY STRUCTURES 25 



[20] Ho, M.-W. (2004). In search of the sublime. Institute of Science in Society. Available 

at www.i-sis.org.uk/sublime.php. 
[21] Huo, X., Chen, J. and Donoho, D. L. (2003). Multiscale detection of filamentary 

features in image data. In Wavelets: Applications in Signal and Image Processing 

X (M. A. Unser, A. Aldroubi and A. F. Laine, eds.) 592-606. SPIE, Bellingham, 

WA. 

[22] Huo, X., Chen, J. and Donoho, D. L. (2003). Multiscale significance run: Realizing 
the "most powerful" detection in noisy images. In Proc. Thirty Seventh Asilomar 
Conference on Signals, Systems, and Computers 1 321-326. IEEE, Piscataway, 
NJ. 

[23] Huo, X., Donoho, D. L., Tovey, C. and Arias-Castro, E. (2004). Dynamic pro- 
gramming methods for "connecting the dots" in scattered point sets. Technical 
report, Dept. Statistics, Stanford Univ. 

[24] Jones, P. W. (1990). Rectifiable sets and the traveling salesman problem. Invent. 
Math. 102 1-15. MR1069238 

[25] Kovacs, I. and J ULESZ, B. (1993). A closed curve is much more than an incomplete 
one: Effect of closure in figure-ground segementation. Proc. Natl. Acad. Sci. USA 
90 7495-7497. 

[26] Legge, G. E., Kersten, D. and Burgess, A. E. (1987). Contrast discrimination 

in noise. J. Opt. Soc. Amer. A 4 391-404. 
[27] Lerman, G. (2003). Quantifying curvelike structures of measures by using L2 Jones 

quantities. Comm. Pure Appl. Math. 56 1294-1365. MR1980856 
[28] Levi, D. M. and Klein, S. A. (2000). Seeing circles: What limits shape perception? 

Vision Research 40 2329-2339. 
[29] Mendola, J. D., Dale, A. M., Fischl, B., Liu, A. K. and Tootell, R. B. H. 

(1999). The representation of illusory and real contours in human cortical visual 

areas revealed by functional magnetic resonance imaging. J. Neuroscience 19 



[30] Pizlo, Z., Salach-Golyska, M. and Rosenfeld, A. (1997). Curve detection in a 
noisy image. Vision Research 37 1217-1241. 

[31] Qaddoumi, N., Ranu, E., McColskey, J. D., Mirshahi, R. and Zoughi, R. 

(2000). Microwave detection of stress-induced fatigue cracks in steel and poten- 
tial for crack opening determination. Research in Nondestructive Evaluation 12 
87-103. 

[32] Sharon, E., Brandt, A. and Basri, R. (2000). Fast multiscale image segmentation. 

In Proc. IEEE Conference on Computer Vision and Pattern Recognition 1 70- 
77. 

[33] Small, C. G. (1996). The Statistical Theory of Shape. Springer, Berlin. MR1418639 
[34] Tupin, F., Maitre, H., Mangin, J.-F., Nicolas, J.-M. and Pechersky, E. 



(1998). Detection of linear features in SAR images: Application to road net- 
work extraction. IEEE Trans. Geoscience and Remote Sensing 36 434-453. 



[35] Wertheimer, M. (1938). Laws of Organization in Perceptual Forms. Harcourt Brace, 
London. 



8560-8572. 



E. Arias-Castro 

Department of Mathematics 

University of California, San Diego 

9500 Gilman Drive 

La Jolla, California 92093-0112 

USA 

E-MAIL: eariasca@math.ucsd.cdu 



d. l. donoho 

Department of Statistics 

Stanford University 

Stanford, California 94305-4065 

USA 

E-MAIL: donoho@stat.stanford.edu 



E. ARIAS-CASTRO, D. L. DONOHO AND X. HUO 



X. Huo 

School of Industrial 

and Systems Engineering 
Georgia Institute of Technology 
Atlanta, Georgia 30332-0205 
USA 

E-MAIL: xiaoming@isye.gatech.cdu 



